(54) Method and apparatus for partitioning resources within a computer system



(57) A method and system are provided that parti- 
tion computer system resources between concurrently 
executing workloads. The method and system operate 
by establishing a first resource pool that specifies re- 
quirements for different computer system resources. 
Next, the different computer system resources are allo- 
cated to one or more resource pools, including the first 
resource pool, to create a resource allocation, wherein 
requirements of the first resource pool are satisfied, and 
wherein resources allocated to the first resource pool 
can change over time. A first process is then bound to 
the first resource pool, so that the first process has access
to the plurality of different computer system resources
allocated to the first resource pool. In one particular
embodiment, while allocating different computer
system resources, the computer system resources are 
partitioned into one or more partitions, wherein a first 
partition is associated with a first resource and a second 
partition is associated with a second resource. The first 
partition is then allocated to a single resource pool, so 
that only processes associated with the single resource 
pool can access the first partition. At the same time, the 
second partition is allocated to multiple resource pools 
so that processes associated with the multiple resource 
pools can share the second partition. 
Description

Field of the Invention 

[0001] The present invention relates to allocating resources within a computing system, for example, between
different concurrently executing workloads.

Background of the invention 

[0002] The advent of computer networks has led to the development of server computer systems that perform
computational operations on behalf of numerous client computer systems. These server computer systems are typically
configured with large amounts of computing resources, such as processors and memory, and are typically employed 
in processing one or more concurrently executing computational workloads. 

[0003] One challenge in designing an operating system to manage such a server is to ensure that computer system
resources are allocated between computational workloads so that the minimum requirements of each workload are
satisfied, and so that the workloads are collectively executed in an efficient manner.

[0004] Some modern computing systems provide support for partitioning a machine-wide resource into smaller sets
and then associating one or more workloads with each of the sets. For example, the SOLARIS™ operating system,
distributed by SUN Microsystems, Inc. of Palo Alto, California, allows processors to be grouped into processor sets,
wherein specific processes may be bound to a specific processor set. In this way, the specific processes do not compete
with other processes for access to the specific processor set.

[0005] However, these partitioning operations must presently be specified manually by a machine operator and are
dependent upon the specific machine configuration, as well as the operator's awareness of resource requirements for
expected workloads. Furthermore, a given allocation of computer system resources is not persistent across machine
failures.

[0006] Other operating systems have developed a mechanism for assembling a group of resources into a fixed 
"container" that processes can bind to in order to access the resources. However, resources within a fixed container 
cannot be flexibly changed over time to accommodate changing resource requirements for the various system work- 
loads. Furthermore, resources cannot be shared between containers. 


Summary of the Invention 

[0007] Accordingly, one embodiment of the present invention provides a system and method that allocate computer
system resources between concurrently executing workloads. A first resource pool is established that specifies
requirements for different computer system resources. Next, the different computer system resources are allocated to
one or more resource pools, including the first resource pool, to create a resource allocation, wherein requirements of
the first resource pool are satisfied, and wherein resources allocated to the first resource pool can change over time.
A first process is then bound to the first resource pool, so that the first process has access to the plurality of different
computer system resources allocated to the first resource pool. Such an approach therefore allows computer system
resources to be allocated between different concurrently executing workloads.

[0008] In one particular embodiment of the present invention, while allocating different computer system resources,
the computer system resources are partitioned into one or more partitions, wherein a first partition is associated with
a first resource and a second partition is associated with a second resource. The first partition is then allocated to a
single resource pool, so that only processes associated with the single resource pool can access the first partition. At
the same time, the second partition is allocated to multiple resource pools so that processes associated with the multiple
resource pools can share the second partition. In this way, non-critical resources can be shared, while other resources
deemed critical to the successful execution of a workload are not shared.

[0009] In one embodiment of the present invention, prior to allocating the different computer system resources, it is
verified that the collective requirements of the one or more resource pools can be satisfied. If the collective requirements
cannot be satisfied, the system signals an error condition.
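
The verification step can be pictured with a minimal sketch. It assumes, purely for illustration, that each pool states a minimum and maximum number of units per resource type and that no partition is shared between pools, so the minimums simply add up. The names `Requirement` and `verify_requirements` are hypothetical and do not come from the described system.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """Hypothetical minimum/maximum requirement of one pool for one resource."""
    minimum: int
    maximum: int


def verify_requirements(pools, capacity):
    """Return {resource: (needed, available)} for every resource type whose
    summed minimum requirements exceed what the machine offers.

    pools:    {pool name: {resource type: Requirement}}
    capacity: {resource type: units available on the machine}
    """
    needed = {}
    for requirements in pools.values():
        for resource, requirement in requirements.items():
            needed[resource] = needed.get(resource, 0) + requirement.minimum
    return {
        resource: (total, capacity.get(resource, 0))
        for resource, total in needed.items()
        if total > capacity.get(resource, 0)
    }


# Illustrative pools; the names and numbers are assumptions.
pools = {
    "web": {"cpu": Requirement(2, 4), "memory_mb": Requirement(1024, 2048)},
    "batch": {"cpu": Requirement(4, 8), "memory_mb": Requirement(1024, 4096)},
}
ok = verify_requirements(pools, {"cpu": 8, "memory_mb": 4096})
too_small = verify_requirements(pools, {"cpu": 4, "memory_mb": 4096})
```

An empty result means the collective requirements can be met. A non-empty result corresponds to the error condition, and names the resource types that fall short.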

[0010] In one embodiment of the present invention, establishing the first resource pool involves selecting a file
containing a representation of the first resource pool from a number of possible files.

[0011] In one embodiment of the present invention, a representation of the resource allocation is saved to non-volatile
storage so that the resource allocation can be reused after a machine failure. This can be achieved, for example, by
using an Extensible Markup Language (XML) representation of the resource allocation. One possibility is to store a
representation of each of the one or more resource pools along with associated resources.

[0012] In one embodiment of the present invention, the first resource pool is associated with a first project, and the 
first process is one of a plurality of processes associated with the first project. 
[0013] In one embodiment of the present invention, establishing the first resource pool involves establishing minimum
and maximum requirements for a given resource.

[0014] In one embodiment of the present invention, the system dynamically adjusts the resource allocation during
execution.

[0015] In one embodiment of the present invention, the different computer system resources can include central
processing units, semiconductor memory, swap space and networking resources.

Brief Description of the Figures 

[0016] Various preferred embodiments of the invention will now be described in detail, by way of example only, with
reference to the following drawings, in which like reference numerals identify like elements:

FIG. 1 illustrates a distributed computing system in accordance with one embodiment of the present invention.
FIG. 2 illustrates how computer system resources are allocated to resource pools in accordance with one embodiment
of the present invention.
FIG. 3 illustrates the structure of a resource pool in accordance with one embodiment of the present invention.
FIG. 4 illustrates how processes are associated with projects in accordance with one embodiment of the present
invention.
FIG. 5 is a flow chart illustrating the process of setting up a resource allocation in accordance with one embodiment
of the present invention.
FIG. 6 is a flow chart illustrating the process of storing a resource allocation to a file in accordance with one
embodiment of the present invention.



Detailed Description

Distributed Computing System



[0017] FIG. 1 illustrates a distributed computing system 100 in accordance with one embodiment of the present
invention. Distributed computing system 100 includes a collection of client computing systems 102-104 that are coupled
to a server computing system 108 through a network 106.

[0018] Clients 102-104 can generally include any device having computational capability and a mechanism for
communicating across the network. Server 108 can generally include any computing device having a mechanism for
servicing requests from clients 102-104 for computational and/or data storage resources. For example, clients 102-104
and server 108 may be a computer system based on a microprocessor, a mainframe computer, a digital signal
processor, a portable computing device, a personal organizer, a device controller, a computational engine within an
appliance, and so on.

[0019] Clients 102-104 and server 108 include central processing units (CPUs) that execute threads. Threads are
entities that generate a series of execution requests, while CPUs are entities that can satisfy the execution requests.
[0020] Network 106 can generally include any type of wired or wireless communication channel capable of coupling
together computing nodes. This includes, but is not limited to, a local area network, a wide area network, or a
combination of networks. In one embodiment of the present invention, network 106 includes the Internet.
[0021] Server 108 includes an operating system 110 that supports flexible resource pools, which can be dynamically
modified during system operation in accordance with an embodiment of the present invention.
[0022] Note that although the present invention is described in the context of a server computer system, it is not
limited to a server computer system. In general, the present invention is applicable to any computer system that
allocates computational resources to different computational workloads.

Allocation of Resources to Pools 

[0023] FIG. 2 illustrates how computer system resources are allocated to pools 220-222 in accordance with one
embodiment of the present invention. As is illustrated in FIG. 2, server 108 contains various computational resources,
including central processing units (CPUs) 202, memory 204, swap space 206, network interfaces 208 and scheduling
classes 210. CPUs 202 include one or more CPUs within server 108. Note that it is possible to allocate an entire CPU
to a pool or, alternatively, a fraction of a CPU. Memory 204 includes the main memory resources of server 108. Swap
space 206 includes disk space that is used as a backing store for the virtual memory system of server 108. Network
interfaces 208 include different channels for connecting server 108 with network 106. Note that network resources can
alternatively be partitioned by allocating "available network bandwidth" instead of individual network interfaces.
[0024] Scheduling classes 210 are not actually system resources for which processes contend, but they can similarly
be allocated to pools. For example, in FIG. 2, time-sharing scheduler 211 is assigned to pool 220, proportional share
scheduler is assigned to pool 221 and real-time scheduler 212 is unassigned.
[0025] As is illustrated in FIG. 2, some of the resources within server 108 are allocated to pool 220, while other
resources are allocated to pool 221. Note that both pool 220 and pool 221 share the same set of CPUs, while pools
220 and 221 do not share memory 204, swap space 206 or network interfaces 208. In this way, non-critical system
resources can be shared, while other resources, deemed critical to a workload's successful execution, are not shared.
This is an important advantage because sharing resources gives rise to more efficient resource utilization, which leads
[...]
[0026] As is illustrated in FIG. 2, server 108 also contains resources
that are not assigned to specific pools.
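
The distinction between a partition held by a single pool and a partition shared by several pools can be sketched as follows. This is only an illustrative model: the `Partition` class, its `exclusive` flag and the partition names are assumptions, and only the pool numerals 220 and 221 echo FIG. 2.

```python
class Partition:
    """A slice of one resource type, such as a set of CPUs (hypothetical model)."""

    def __init__(self, name, resource, exclusive):
        self.name = name
        self.resource = resource
        self.exclusive = exclusive  # True: at most one pool may hold it
        self.pools = set()          # pools the partition is allocated to

    def allocate_to(self, pool):
        """Allocate the partition to a pool, refusing to break exclusivity."""
        if self.exclusive and self.pools and pool not in self.pools:
            raise ValueError(f"{self.name} is reserved for another pool")
        self.pools.add(pool)

    def accessible_by(self, pool):
        """Processes bound to `pool` may use the partition only if it is allocated there."""
        return pool in self.pools


# A shared CPU partition and a memory partition held by a single pool.
cpus = Partition("cpu-set", "cpu", exclusive=False)
memory = Partition("mem-set", "memory", exclusive=True)
cpus.allocate_to("pool220")
cpus.allocate_to("pool221")
memory.allocate_to("pool220")
```

A partition with an empty `pools` set models a resource that is not assigned to any specific pool.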

Structure of Resource Pool 

[0027] FIG. 3 illustrates the structure of resource pool 220 in accordance with one embodiment of the present
invention. Resource pool 220 includes references to [...]
a reference to a memory set 304, a reference to a swap set 306, a reference to [...] group 308 and a
reference to a scheduling class 310. These references keep track of the associations between pool 220 and its
resources. These references are indicated by arrows in FIG. 2.
[0028] Each of these references points to a resource data structure. For example, the reference to processor set 302
points to resource data structure 320. Resource data structure 320 [...] including a list of the
processor units 322 assigned to the resource. (Note that [...] resources, such
as processors, that are allocated in discrete units. For other resources that [...] range, such as memory,
[...]
[...] each resource that is specified in the file (step 514). This can be accomplished by first meeting the minimum
requirements for each pool, and then using a card dealing algorithm to dole out additional resources. If the system fails
during this partitioning process, the system signals an error condition and terminates (step 512).
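
The text does not spell out the card dealing algorithm, so the following is only one plausible reading: after every pool has received its minimum, the remaining units are handed out one at a time in round-robin order, skipping pools that have reached their maximum. The function name `deal_resources` and the example numbers are hypothetical.

```python
def deal_resources(total_units, pools):
    """Partition `total_units` of one resource among pools.

    pools: {pool name: (minimum, maximum)}
    Returns (allocation, leftover), where allocation maps pool name to units.
    """
    # First meet the minimum requirement of every pool.
    allocation = {name: minimum for name, (minimum, _) in pools.items()}
    remaining = total_units - sum(allocation.values())
    if remaining < 0:
        raise ValueError("minimum requirements exceed the available units")

    # Then deal the rest out one unit at a time, like cards around a table.
    while remaining > 0:
        dealt = False
        for name, (_, maximum) in pools.items():
            if remaining == 0:
                break
            if allocation[name] < maximum:
                allocation[name] += 1
                remaining -= 1
                dealt = True
        if not dealt:
            break  # every pool is at its maximum; leftover stays unassigned
    return allocation, remaining


allocation, leftover = deal_resources(10, {"a": (2, 4), "b": (1, 8)})
```

Dealing single units keeps the surplus spread evenly between pools. A weighted variant, for example one driven by a per-pool importance value, would be an equally reasonable interpretation.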

[0035] Once the resources are successfully partitioned, the system associates pools with the partitions (step 516).
The system then binds each process to a specific pool that is associated with the process (step 518). In one embodiment
of the present invention, this involves looking up a project that is associated with the process, and then looking up the
pool that is associated with the project. Next, the system binds the process to each resource within the pool (step 520).
[0036] Note that this allocation of resources to processes is merely an initial allocation. This allocation can change 
over time as the system dynamically adjusts resource allocations based upon changing workload requirements (step 
522). 
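
The two-stage lookup used when binding (process to project, then project to pool) can be sketched with plain dictionaries. All table contents below are illustrative assumptions, and `bind_process` is a hypothetical helper rather than an interface of the described system.

```python
# Hypothetical lookup tables: process id -> project, project -> pool,
# and pool -> the resource sets currently associated with it.
process_project = {101: "marketing", 102: "marketing", 201: "sales"}
project_pool = {"marketing": "web_marketing", "sales": "web_sales"}
pool_resources = {
    "web_marketing": {"processor_rset": "small-0", "memory_rset": "medium-0"},
    "web_sales": {"processor_rset": "small-1", "memory_rset": "medium-1"},
}


def bind_process(pid):
    """Return the pool for a process and the resource sets it is bound to."""
    project = process_project[pid]         # look up the project of the process
    pool = project_pool[project]           # look up the pool of the project
    bindings = dict(pool_resources[pool])  # bind to each resource within the pool
    return pool, bindings


pool, bindings = bind_process(102)
```

Because a process reaches its resources through the pool, changing the entries in `pool_resources` is enough to model a later adjustment of the allocation without touching the process tables.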

Process of Storing a Resource Allocation 

[0037] FIG. 6 is a flow chart illustrating the process of storing a resource allocation to a file in accordance with one
embodiment of the present invention. (Note that although the system illustrated uses an XML representation for
resources and pools, in general any representation can be used, and the present invention is not limited to an XML
representation.)

[0038] The system processes each resource in turn. For each resource, the system assigns a unique identifier to
the resource (step 602), and then enumerates properties of the resource (step 604). Next, the system creates a resource
node with properties as child nodes (step 606). The system then transforms this resource-property tree into an XML
tag containing property sub-tags (step 608).

[0039] Next, the system processes each pool in turn. For each pool, the system identifies dependent resources by
a unique identifier (step 610). The system then transforms the pool, along with ID-based resource references, into an
XML pool tag containing references as attributes (step 612).

[0040] Next, the system commits the XML representation of the resources and the pools to a designated file (step
614). This allows the resources and pools to be reconstituted after a system failure.
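
Steps 602 to 614 can be approximated with Python's standard `xml.etree.ElementTree` module. The sketch below is an assumption-laden illustration: the input dictionaries, the `property` sub-tag name and the sequential identifiers are invented for the example and need not match the format the system actually writes.

```python
import itertools
import os
import tempfile
import xml.etree.ElementTree as ET


def build_config(resources, pools):
    """resources: {name: (kind, {property: value})}; pools: {name: [resource names]}."""
    root = ET.Element("pool_conf")
    next_id = itertools.count(1)
    ref_ids = {}
    for name, (kind, properties) in resources.items():
        ref_ids[name] = str(next(next_id))                  # step 602: unique identifier
        node = ET.SubElement(root, kind, name=name, ref_id=ref_ids[name])
        for prop, value in properties.items():              # steps 604-608: property sub-tags
            ET.SubElement(node, "property", name=prop).text = str(value)
    for name, members in pools.items():
        attrs = {"name": name}
        for member in members:                              # step 610: reference by identifier
            attrs["resource_" + resources[member][0]] = ref_ids[member]
        ET.SubElement(root, "pool", attrs)                  # step 612: references as attributes
    return ET.ElementTree(root)


def commit(tree, path):
    """Step 614: write the XML representation to the designated file."""
    tree.write(path, encoding="utf-8", xml_declaration=True)


resources = {
    "small-0": ("processor_rset", {"cpus": 4}),
    "medium-0": ("memory_rset", {"unit": "MB", "size": 1024}),
}
pools = {"web_marketing": ["small-0", "medium-0"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pools.xml")
    commit(build_config(resources, pools), path)
    restored = ET.parse(path).getroot()  # what a restart would read back
```

Reading the file back into `restored` stands in for reconstituting the configuration after a failure.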

[0041] After the configuration file is created, the system administrator can replicate the configuration file across 
multiple machines in order to guarantee a stable configuration. There may be some minor edits required (e.g. CPU 
names may differ, board names may differ), but the configuration is largely stable and very portable. Also, the config- 
uration is persistent across reboots. 

[0042] Furthermore, note that the configuration can be easily amended with small edits, such as altering the maximum
amount of resource in a set, or major edits, such as the complete removal of all pools on the system.
[0043] Moreover, the above-described model is flexible. A system administrator may choose to bind multiple pools
to a single resource, or may bind only one pool to a partition and thus provide guaranteed control of the partition for a
pool. The system administrator may even leave a resource completely unutilized by associating no pools with the
partition, thereby leaving the resource as an "emergency standby partition".

[0044] With the above-described model, shifting workload is very easy. It simply involves associating the pool with 
a different set of resources. Furthermore, one or multiple resource sets may be changed, and the resource sets can 
be changed many times over the lifetime of the pool.

[0045] Additionally, since the configuration document is XML, the configuration can be transformed into alternative
formats easily, and can thus be re-used by an XML-aware application that requires pool-related information. For
instance, a pool monitoring application can read the dynamic XML configuration file and report the current configuration
as an HTML document or as a standard output text file.
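
As a rough illustration of such a monitoring application, the sketch below turns a small pool configuration into a plain-text report. The embedded XML is a simplified, hypothetical fragment modeled on the element and attribute names used in this description, and `text_report` is an invented helper.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical configuration fragment used only for this example.
SAMPLE = """<pool_conf>
  <processor_rset name="small-0" ref_id="4845581">
    <cpu id="4" /><cpu id="5" />
  </processor_rset>
  <memory_rset name="medium-0" ref_id="8701782" unit="MB" size="1024" />
  <pool name="web_marketing" resource_processor_rset="4845581"
        resource_memory_rset="8701782" importance="10" />
</pool_conf>"""


def text_report(xml_text):
    """Summarize each pool and the resource sets it references, one line per pool."""
    root = ET.fromstring(xml_text)
    by_ref = {el.get("ref_id"): el for el in root if el.tag.endswith("_rset")}
    lines = []
    for pool in root.iter("pool"):
        cpus = by_ref[pool.get("resource_processor_rset")]
        mem = by_ref[pool.get("resource_memory_rset")]
        lines.append(
            f"{pool.get('name')}: {len(cpus.findall('cpu'))} CPUs ({cpus.get('name')}), "
            f"{mem.get('size')} {mem.get('unit')} ({mem.get('name')})"
        )
    return "\n".join(lines)


report = text_report(SAMPLE)
```

The same traversal could emit HTML instead of text by changing only the line that formats each pool.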

Example Configuration File 

[0046] The sample configuration file that appears below illustrates how resources and pools for a particular host can
be represented in XML. Elements that contain other elements (for instance, processor_rset contains cpu) represent a
containment relationship between those elements. Also, there are association relationships, which represent
relationships where elements require access to an uncontained element. For instance, pool elements have a
resource_processor_rset attribute which references a defined processor_rset element.
<?xml version="1.0"?>
<!DOCTYPE pool_conf
    PUBLIC "-//Sun Microsystems Inc//DTD Resource Management All//EN"
    "file:///usr/share/lib/xml/dtd/rm_all.dtd">
<pool_conf>
    <processor_rset name="default" default ref_id="3452157">
        <cpu id="0" ref_id="2313243" />
        <cpu id="1" ref_id="7568334" />
        <cpu id="2" ref_id="6725923" />
        <cpu id="3" ref_id="[?]786376" />
    </processor_rset>
    <memory_rset name="default" default ref_id="7091674" unit="MB" size="2048" />
    <processor_rset name="small-0" id="0" ref_id="4845581">
        <cpu id="4" ref_id="5219421" />
        <cpu id="5" ref_id="6957092" />
        <cpu id="6" ref_id="7951354" />
        <cpu id="7" ref_id="3812561" />
    </processor_rset>
    <processor_rset name="small-1" [...]>
        <cpu id="8" ref_id="7900695" />
        <cpu id="9" ref_id="7716369" />
        <cpu id="10" ref_id="8321533" />
        <cpu id="11" ref_id="4773559" />
    </processor_rset>
    <processor_rset name="large-0" id="2" ref_id="6841430">
        <cpu id="12" ref_id="5596008" />
        <cpu id="13" ref_id="4675903" />
        <cpu id="14" ref_id="6997070" />
        <cpu id="15" ref_id="7944641" />
        <cpu id="16" ref_id="5091552" />
        <cpu id="17" ref_id="1401062" />
        <cpu id="18" ref_id="3872070" />
        <cpu id="19" ref_id="6022338" />
    </processor_rset>
    <memory_rset name="medium-0" id="1" ref_id="8701782" unit="MB" size="1024" />
    <memory_rset name="medium-1" id="2" ref_id="1659240" unit="MB" size="1024" />
    <memory_rset name="small-0" id="3" ref_id="3981018" unit="MB" [...] />
    <pool name="web_marketing" ref_id="3594665" resource_processor_rset="4845581"
        resource_memory_rset="8701782" importance="10" />
    <pool name="web_sales" ref_id="9338378" resource_processor_rset="6520690"
        resource_memory_rset="1659240" importance="10" />
    <pool name="app_marketing" ref_id="6784973" resource_processor_rset="6841430"
        resource_memory_rset="3981018" importance="20" />
</pool_conf>
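
The association relationships can be checked mechanically: every `resource_*` attribute on a pool should name the `ref_id` of a defined element of the matching type. The sketch below is a hypothetical validator written against a small well-formed fragment. The `orphan` pool and its identifier are invented to show a failing case.

```python
import xml.etree.ElementTree as ET

# Small hypothetical fragment; "orphan" deliberately references an undefined set.
SAMPLE = """<pool_conf>
  <processor_rset name="small-0" ref_id="4845581" />
  <memory_rset name="medium-0" ref_id="8701782" unit="MB" size="1024" />
  <pool name="web_marketing" resource_processor_rset="4845581"
        resource_memory_rset="8701782" />
  <pool name="orphan" resource_processor_rset="0000000"
        resource_memory_rset="8701782" />
</pool_conf>"""


def dangling_references(xml_text):
    """Return (pool name, attribute, ref_id) for each reference that does not
    resolve to a defined element of the type named by the attribute."""
    root = ET.fromstring(xml_text)
    defined = {el.get("ref_id"): el.tag for el in root if el.tag != "pool"}
    problems = []
    for pool in root.iter("pool"):
        for attr, ref in pool.attrib.items():
            if not attr.startswith("resource_"):
                continue
            expected_tag = attr[len("resource_"):]  # e.g. "processor_rset"
            if defined.get(ref) != expected_tag:
                problems.append((pool.get("name"), attr, ref))
    return problems


problems = dangling_references(SAMPLE)
```

A check of this kind would be a natural place to signal an error before any partitioning is attempted, although the description does not say whether the system validates references this way.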

[0047] Note that the data structures and code described herein are typically stored on [...]
medium may include a communications network, such as the Internet.
[0048] In conclusion, it will be appreciated that the different embodiments described above are by way of illustration
only, and not by way of limitation. Thus various modifications and adaptations of these embodiments will be apparent
to the skilled person, and remain within the scope of the invention as specified by the following claims and their
equivalents.


Claims 

1. A method for allocating computer system resources between concurrently executing workloads, comprising:

establishing a first resource pool that specifies requirements for each of a plurality of different computer system
resources;
allocating the plurality of different computer system resources to one or more resource pools, including the
first resource pool, to create a resource allocation, wherein the requirements of the first resource pool are
satisfied, and wherein resources allocated to the first resource pool can change over time; and
binding a first process to the first resource pool, so that the first process has access to the plurality of different
computer system resources allocated to the first resource pool.

2. The method of claim 1, wherein allocating the plurality of different computer system resources to one or more
resource pools involves:

partitioning each of the plurality of different computer system resources into one or more partitions, wherein
a first partition is associated with a first resource and a second partition is associated with a second resource;
allocating the first partition to a single resource pool, so that only processes associated with the single resource
pool can access the first partition; and
allocating the second partition to multiple resource pools so that processes associated with the multiple
resource pools can share the second partition.

3. The method of claim 1 or 2, wherein prior to allocating the plurality of different computer system resources, the
method further comprises:

verifying that collective requirements of the one or more resource pools can be satisfied; and
if the collective requirements cannot be satisfied, signaling an error condition.

4. The method of any preceding claim, wherein establishing the first resource pool involves selecting a file containing
a representation of the first resource pool from a plurality of possible files.

5. The method of any preceding claim, further comprising storing a representation of the resource allocation to
non-volatile storage so that the resource allocation can be reused after a machine failure.

6. The method of claim 5, wherein storing the representation of the resource allocation involves storing a
representation of each of the one or more resource pools along with associated resources.

7. The method of claim 5 or 6, wherein storing the representation of the resource allocation involves storing an
Extensible Markup Language (XML) representation of the resource allocation.

8. The method of any preceding claim,

wherein the first resource pool is associated with a first project; and

wherein the first process is one of a plurality of processes associated with the first project.

9. The method of any preceding claim, wherein establishing the first resource pool involves establishing minimum
and maximum requirements for a given resource.

10. The method of any preceding claim, further comprising dynamically adjusting the resource allocation during system
execution.

11. The method of any preceding claim, wherein the plurality of different computer system resources can include:

central processing units;
semiconductor memory;
swap space; and
networking resources.

12. [...] method comprising:

establishing a first resource pool that specifies requirements for each of a plurality of different computer system
resources;
[...]
computer system resources allocated to the first resource pool.

13. An apparatus that allocates computer system resources between concurrently executing workloads, comprising:

an establishment mechanism that is configured to establish a first resource pool that specifies requirements
[...]
has access to the plurality of different computer system resources allocated to the first resource pool.

14. The apparatus of claim 13, wherein the allocation mechanism is configured to:

partition each of the plurality of different computer system resources [...]
[...] so that processes associated with the multiple resource
pools can share the second partition.

15. The apparatus of claim 13 or 14, the apparatus additionally [...]

[...]
an error condition.

16. The apparatus of any of claims 13 to 15, wherein the establishment mechanism is configured to select a file
containing a representation of the first resource pool from a plurality of possible files.

17. [...]
a machine failure.

18. The apparatus of claim 17, wherein the archiving mechanism is configured to store a representation of each of the
one or more resource pools along with associated resources.

19. The apparatus of claim 17 or 18, wherein the archiving mechanism is configured to store an Extensible Markup 
Language (XML) representation of the resource allocation. 

20. The apparatus of any of claims 13 to 19,

wherein the first resource pool is associated with a first project; and

wherein the first process is one of a plurality of processes associated with the first project.
21. The apparatus of any of claims 13 to 20, wherein the establishment mechanism is configured to establish minimum
and maximum requirements for a given resource.

22. The apparatus of any of claims 13 to 21, further comprising an adjustment mechanism that is configured to
dynamically adjust the resource allocation during system execution.

23. The apparatus of any of claims 13 to 22, wherein the plurality of different computer system resources can include:

central processing units;
semiconductor memory;
swap space; and
networking resources.

24. A computer program comprising instructions that when executed by a computer, cause it to perform the method of
any of claims 1 to 11.